Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
                                            Some full text articles may not yet be available without a charge during the embargo (administrative interval).
                                        
                                        
                                        
                                            
                                                
                                             What is a DOI Number?
                                        
                                    
                                
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            Free, publicly-accessible full text available December 1, 2026
- 
            Ingvarsson, P (Ed.)Abstract Eucalyptus grandis is a hardwood tree used worldwide as pure species or hybrid partner to breed fast-growing plantation forestry crops that serve as feedstocks of timber and lignocellulosic biomass for pulp, paper, biomaterials, and biorefinery products. The current v2.0 genome reference for the species served as the first reference for the genus and has helped drive the development of molecular breeding tools for eucalypts. Using PacBio HiFi long reads and Omni-C proximity ligation sequencing, we produced an improved, haplotype-phased assembly (v4.0) for TAG0014, an early-generation selection of E. grandis. The 2 haplotypes are 571 Mbp (HAP1) and 552 Mbp (HAP2) in size and consist of 37 and 46 contigs scaffolded onto 11 chromosomes (contig N50 of 28.9 and 16.7 Mbp), respectively. These haplotype assemblies are 70–90 Mbp smaller than the diploid v2.0 assembly but capture all except one of the 22 telomeres, suggesting that substantial redundant sequence was included in the previous assembly. A total of 35,929 (HAP1) and 35,583 (HAP2) gene models were annotated, of which 438 and 472 contain long introns (>10 kbp) in gene models previously (v2.0) identified as multiple smaller genes. These and other improvements have increased gene annotation completeness levels from 93.8 to 99.4% in the v4.0 assembly. We found that 6,493 and 6,346 genes are within tandem duplicate arrays (HAP1 and HAP2, respectively, 18.4 and 17.8% of the total) and >43.8% of the haplotype assemblies consists of repeat elements. Analysis of synteny between the haplotypes and the E. grandis v2.0 reference genome revealed extensive regions of collinearity, but also some major rearrangements, and provided a preview of population and pangenome variation in the species.more » « lessFree, publicly-accessible full text available May 30, 2026
- 
            Wilson, Melissa (Ed.)Abstract Horseshoe crabs, considered living fossils with a stable morphotype spanning ∼445 million years, are evolutionarily, ecologically, and biomedically important species experiencing rapid population decline. Of the four extant species of horseshoe crabs, the Atlantic horseshoe crab, Limulus polyphemus, has become an essential component of the modern medicine toolkit. Here, we present the first chromosome-level genome assembly, and the most contiguous and complete assembly to date, for L. polyphemus using nanopore long-read sequencing and chromatin conformation analysis. We find support for three horseshoe crab-specific whole-genome duplications, but none shared with Arachnopulmonata (spiders and scorpions). Moreover, we discovered tandem duplicates of endotoxin detection pathway components Factors C and G, identify candidate centromeres consisting of Gypsy retroelements, and classify the ZW sex chromosome system for this species and a sister taxon, Carcinoscorpius rotundicauda. Finally, we revealed this species has been experiencing a steep population decline over the last 5 million years, highlighting the need for international conservation interventions and fisheries-based management for this critical species.more » « lessFree, publicly-accessible full text available February 1, 2026
- 
            Fraser, Bonnie (Ed.)Abstract Over 400 million years old, scorpions represent an ancient group of arachnids and one of the first animals to adapt to life on land. Presently, the lack of available genomes within scorpions hinders research on their evolution. This study leverages ultralong nanopore sequencing and Pore-C to generate the first chromosome-level assembly and annotation for the desert hairy scorpion, Hadrurus arizonensis. The assembled genome is 2.23 Gb in size with an N50 of 280 Mb. Pore-C scaffolding reoriented 99.6% of bases into nine chromosomes and BUSCO identified 998 (98.6%) complete arthropod single copy orthologs. Repetitive elements represent 54.69% of the assembled bases, including 872,874 (29.39%) LINE elements. A total of 18,996 protein-coding genes and 75,256 transcripts were predicted, and extracted protein sequences yielded a BUSCO score of 97.2%. This is the first genome assembled and annotated within the family Hadruridae, representing a crucial resource for closing gaps in genomic knowledge of scorpions, resolving arachnid phylogeny, and advancing studies in comparative and functional genomics.more » « less
- 
            Moura, Mario R. (Ed.)Projecting ecological and evolutionary responses to variable and changing environments is central to anticipating and managing impacts to biodiversity and ecosystems. Current modeling approaches are largely phenomenological and often fail to accurately project responses due to numerous biological processes at multiple levels of biological organization responding to environmental variation at varied spatial and temporal scales. Limited mechanistic understanding of organismal responses to environmental variability and extremes also restricts predictive capacity. We outline a strategy for identifying and modeling the key organismal mechanisms across levels of biological organization that mediate ecological and evolutionary responses to environmental variation. A central component of this strategy is quantifying timescales and magnitudes of climatic variability and how organisms experience them. We highlight recent empirical research that builds this information and suggest how to design future experiments that can produce more generalizable principles. We discuss how to create biologically informed projections in a feasible way by combining statistical and mechanistic approaches. Predictions will inform both fundamental and practical questions at the interface of ecology, evolution, and Earth science such as how organisms experience, adapt to, and respond to environmental variation at multiple hierarchical spatial and temporal scales.more » « less
- 
            Welcome to the big leaves: Best practices for improving genome annotation in non‐model plant genomesAbstract PremiseRobust standards to evaluate quality and completeness are lacking in eukaryotic structural genome annotation, as genome annotation software is developed using model organisms and typically lacks benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. The annotation of plant genomes is particularly challenging due to their large sizes, abundant transposable elements, and variable ploidies. This study investigates the impact of genome quality, complexity, sequence read input, and method on protein‐coding gene predictions. MethodsThe impact of repeat masking, long‐read and short‐read inputs, and de novo and genome‐guided protein evidence was examined in the context of the popular BRAKER and MAKER workflows for five plant genomes. The annotations were benchmarked for structural traits and sequence similarity. ResultsBenchmarks that reflect gene structures, reciprocal similarity search alignments, and mono‐exonic/multi‐exonic gene counts provide a more complete view of annotation accuracy. Transcripts derived from RNA‐read alignments alone are not sufficient for genome annotation. Gene prediction workflows that combine evidence‐based and ab initio approaches are recommended, and a combination of short and long reads can improve genome annotation. Adding protein evidence from de novo assemblies, genome‐guided transcriptome assemblies, or full‐length proteins from OrthoDB generates more putative false positives as implemented in the current workflows. Post‐processing with functional and structural filters is highly recommended. DiscussionWhile the annotation of non‐model plant genomes remains complex, this study provides recommendations for inputs and methodological approaches. We discuss a set of best practices to generate an optimal plant genome annotation and present a more robust set of metrics to evaluate the resulting predictions.more » « less
- 
            Abstract Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short reads of haploid megagametophyte tissue and Oxford Nanopore long reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly containing 27.6 Gb of sequence in 92,740 contigs (N50 537,007 bp) and 34,716 scaffolds (N50 2.0 Gb). Approximately 87.2% (24.0 Gb) of total sequence was placed on the 12 WBP chromosomes. Annotation yielded 25,362 protein-coding genes, and over 77% of the genome was characterized as repeats. WBP has demonstrated the greatest variation in resistance to WPBR among the North American white pines. Candidate genes for quantitative resistance include disease resistance genes known as nucleotide-binding leucine-rich repeat receptors (NLRs). A combination of protein domain alignments and direct genome scanning was employed to fully describe the 3 subclasses of NLRs. Our high-quality reference sequence and annotation provide a marked improvement in NLR identification compared to previous assessments that leveraged de novo-assembled transcriptomes.more » « less
- 
            Abstract BackgroundDe novo phased (haplo)genome assembly using long-read DNA sequencing data has improved the detection and characterization of structural variants (SVs) in plant and animal genomes. Able to span across haplotypes, long reads allow phased, haplogenome assembly in highly outbred organisms such as forest trees. Eucalyptus tree species and interspecific hybrids are the most widely planted hardwood trees with F1 hybrids of Eucalyptus grandis and E. urophylla forming the bulk of fast-growing pulpwood plantations in subtropical regions. The extent of structural variation and its effect on interspecific hybridization is unknown in these trees. As a first step towards elucidating the extent of structural variation between the genomes of E. grandis and E. urophylla, we sequenced and assembled the haplogenomes contained in an F1 hybrid of the two species. FindingsUsing Nanopore sequencing and a trio-binning approach, we assembled the separate haplogenomes (566.7 Mb and 544.5 Mb) to 98.0% BUSCO completion. High-density SNP genetic linkage maps of both parents allowed scaffolding of 88.0% of the haplogenome contigs into 11 pseudo-chromosomes (scaffold N50 of 43.8 Mb and 42.5 Mb for the E. grandis and E. urophylla haplogenomes, respectively). We identify 48,729 SVs between the two haplogenomes providing the first detailed insight into genome structural rearrangement in these species. The two haplogenomes have similar gene content, 35,572 and 33,915 functionally annotated genes, of which 34.7% are contained in genome rearrangements. ConclusionsKnowledge of SV and haplotype diversity in the two species will form the basis for understanding the genetic basis of hybrid superiority in these trees.more » « less
- 
            Holland, J. (Ed.)Abstract Douglas-fir (Pseudotsuga menziesii) is native to western North America. It grows in a wide range of environmental conditions and is an important timber tree. Although there are several studies on the gene expression responses of Douglas-fir to abiotic cues, the absence of high-quality transcriptome and genome data is a barrier to further investigation. Like for most conifers, the available transcriptome and genome reference dataset for Douglas-fir remains fragmented and requires refinement. We aimed to generate a highly accurate, and complete reference transcriptome and genome annotation. We deep-sequenced the transcriptome of Douglas-fir needles from seedlings that were grown under nonstress control conditions or a combination of heat and drought stress conditions using long-read (LR) and short-read (SR) sequencing platforms. We used 2 computational approaches, namely de novo and genome-guided LR transcriptome assembly. Using the LR de novo assembly, we identified 1.3X more high-quality transcripts, 1.85X more “complete” genes, and 2.7X more functionally annotated genes compared to the genome-guided assembly approach. We predicted 666 long noncoding RNAs and 12,778 unique protein-coding transcripts including 2,016 putative transcription factors. We leveraged the LR de novo assembled transcriptome with paired-end SR and a published single-end SR transcriptome to generate an improved genome annotation. This was conducted with BRAKER2 and refined based on functional annotation, repetitive content, and transcriptome alignment. This high-quality genome annotation has 51,419 unique gene models derived from 322,631 initial predictions. Overall, our informatics approach provides a new reference Douglas-fir transcriptome assembly and genome annotation with considerably improved completeness and functional annotation.more » « less
- 
            Abstract With the advent of affordable and more accurate third-generation sequencing technologies, and the associated bioinformatic tools, it is now possible to sequence, assemble, and annotate more species of conservation concern than ever before. Juglans cinerea, commonly known as butternut or white walnut, is a member of the walnut family, native to the Eastern United States and Southeastern Canada. The species is currently listed as Endangered on the IUCN Red List due to decline from an invasive fungus known as Ophiognomonia clavigignenti-juglandacearum (Oc-j) that causes butternut canker. Oc-j creates visible sores on the trunks of the tree which essentially starves and slowly kills the tree. Natural resistance to this pathogen is rare. Conserving butternut is of utmost priority due to its critical ecosystem role and cultural significance. As part of an integrated undergraduate and graduate student training program in biodiversity and conservation genomics, the first reference genome for Juglans cinerea is described here. This chromosome-scale 539 Mb assembly was generated from over 100 × coverage of Oxford Nanopore long reads and scaffolded with the Juglans mandshurica genome. Scaffolding with a closely related species oriented and ordered the sequences in a manner more representative of the structure of the genome without altering the sequence. Comparisons with sequenced Juglandaceae revealed high levels of synteny and further supported J. cinerea's recent phylogenetic placement. Comparative assessment of gene family evolution revealed a significant number of contracting families, including several associated with biotic stress response.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
